Privacy-Aware and Secure Decentralized Air Quality Monitoring
Indoor Air Quality monitoring is a major asset for improving quality of life and building management. Today, the evolution of embedded technologies allows the implementation of such monitoring on the edge of the network. However, several concerns need to be addressed related to data security and privacy, routing and sink placement optimization, protection from external monitoring, and distributed data mining. In this paper, we describe an integrated framework that features distributed storage, blockchain-based Role-based Access Control, onion routing, routing and sink placement optimization, and distributed data mining to address these concerns. We describe the organization of our contribution and show its relevance with simulations and experiments over a set of use cases.
Using words from daily news headlines to predict the movement of stock market indices
Stock market analysis is one of the biggest areas of interest for text mining.
Many researchers have proposed different approaches that use text information
for predicting the movement of stock market indices. Many of these approaches
focus either on maximising the predictive accuracy of the model
or on devising alternative methods for model evaluation. In this paper,
we propose a more descriptive approach focusing on the models themselves,
trying to identify the individual words in the text that most affect
the movement of stock market indices. We use data from two sources (for
the past eight years): the daily data for the Dow Jones Industrial Average
index (‘open’ and ‘close’ values for each trading day) and the headlines of
the 25 most-voted news items on the Reddit World News Channel for the previous
‘trading days.’ By applying machine learning algorithms on these data
and analysing individual words that appear in the final predictive models,
we find that the words gay, propaganda and massacre are typically associated
with a daily increase of the stock index, while the word IRAN mostly
coincides with its decrease. While this work presents a first step towards
qualitative analysis of stock market models, there is still plenty of room for
improvement.
Using rule learning for subgroup discovery
This dissertation investigates how to adapt standard classification rule
learning approaches to subgroup discovery. The goal of subgroup
discovery is to find rules describing subsets of a selected population
that are sufficiently large and statistically unusual in terms of class
distribution. The dissertation presents a subgroup discovery algorithm,
CN2-SD, developed by modifying parts of the CN2 classification rule
learner: its covering algorithm, search heuristic, probabilistic
classification of instances, and evaluation measures. Experimental
evaluation of CN2-SD on selected data sets shows substantial reduction
of the number of induced rules, increased rule coverage, rule
significance and overall coverage of the target concept as well as
slight improvements in terms of the area under ROC curve, when compared
with rule learning algorithms CN2 and RIPPER. An application of CN2-SD
to a large traffic accident data set confirms these findings.
This dissertation also presents the subgroup discovery algorithm
APRIORI-SD, developed by adapting association rule learning to subgroup
discovery. This was achieved by building a classification rule learner
APRIORI-C, enhanced with a novel post–processing mechanism, a new
quality measure for induced rules (weighted relative accuracy) and using
probabilistic classification of instances. Experimental results show a
similar behavior of APRIORI-SD and the subgroup discovery algorithm
CN2-SD, i.e., substantial reduction of the number of induced rules,
increased rule coverage, rule significance and overall coverage of the
target concept as well as slight improvements in terms of the area under
ROC curve, when compared with rule learning algorithms CN2, RIPPER and
APRIORI-C.
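The weighted relative accuracy measure mentioned above is commonly defined as WRAcc(Cond → Class) = p(Cond) · (p(Class | Cond) − p(Class)). The following is a minimal sketch of that standard definition computed from raw counts; the function name and argument names are illustrative assumptions and are not taken from the dissertation.

```python
def wracc(n_total, n_class, n_cond, n_cond_class):
    """Weighted relative accuracy of a rule Cond -> Class.

    n_total      -- number of examples in the data set
    n_class      -- number of examples of the target class
    n_cond       -- number of examples covered by the rule condition
    n_cond_class -- covered examples that also belong to the target class

    All names are hypothetical; only the formula is the standard definition.
    """
    if n_total <= 0 or n_cond <= 0:
        # A rule that covers nothing is neither general nor unusual.
        return 0.0
    coverage = n_cond / n_total          # p(Cond): generality of the rule
    precision = n_cond_class / n_cond    # p(Class | Cond)
    prior = n_class / n_total            # p(Class)
    return coverage * (precision - prior)


# A rule covering 20 of 100 examples, 15 of them in a class with prior 0.4.
score = wracc(n_total=100, n_class=40, n_cond=20, n_cond_class=15)
```

The measure trades generality (coverage) against unusualness (the gain in class probability over the prior), so a rule whose class distribution matches the prior scores zero regardless of how many examples it covers.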
A new optimization approach to subgroup discovery based on ROC analysis
is also presented and implemented as an adaptation of the APRIORI-SD
algorithm. The implications of the
“number-of-rules–unusualness–coverage” trade-off for subgroup discovery
are investigated through an experimental evaluation of the adapted
APRIORI-SD algorithm on selected data sets. The results are presented in
the form of 2D graphs depicting the dependencies between the number of
induced rules, unusualness, accuracy and overall coverage of the target
concept, and the original APRIORI-SD subgroup discovery algorithm is
discussed in this new optimization framework.
Finally, the dissertation presents a comparison of the new algorithms
with existing state–of–the–art subgroup discovery algorithms and the
application of CN2-SD and APRIORI-SD to a real–life problem – the
traffic accident database – a database describing traffic accidents in
Great Britain.
Coverage-based classification using association rule mining
Building accurate and compact classifiers in real-world applications is one of the crucial tasks in data mining nowadays. In this paper, we propose a new method that can reduce the number of class association rules produced by classical class association rule classifiers, while maintaining an accurate classification model that is comparable to the ones generated by state-of-the-art classification algorithms. More precisely, we propose a new associative classifier that selects “strong” class association rules based on overall coverage of the learning set. The advantage of the proposed classifier is that it generates significantly smaller rules on bigger datasets compared to traditional classifiers while maintaining the classification accuracy. We also discuss how the overall coverage of such classifiers affects their classification accuracy. Experiments measuring classification accuracy, number of classification rules and other relevance measures such as precision, recall and f-measure on 12 real-life datasets from the UCI ML repository (Dua, D.; Graff, C. UCI Machine Learning Repository. Irvine, CA: University of California, 2019) show that our method was comparable to 8 other well-known rule-based classification algorithms. It achieved the second-highest average accuracy (84.9%) and the best result in terms of average number of rules among all classification methods. Although it did not achieve the best results in terms of classification accuracy, our method proved to produce compact and understandable classifiers by exhaustively searching the entire example space.
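The idea of selecting rules by overall coverage of the learning set can be illustrated with a generic sequential-covering sketch. This is an assumption-laden illustration, not the authors' algorithm: the rule representation (a dict of attribute conditions plus a class label), the pre-sorting of rules by strength, and the `min_coverage` threshold are all hypothetical.

```python
def select_by_coverage(rules, examples, min_coverage=1.0):
    """Keep only rules that correctly cover not-yet-covered examples.

    rules    -- list of (conditions: dict, label), assumed pre-sorted
                from strongest to weakest (e.g. by confidence and support)
    examples -- list of (features: dict, label) forming the learning set
    Returns the selected rules and the fraction of examples they cover.
    """
    covered = set()
    selected = []
    target = min_coverage * len(examples)
    for conditions, label in rules:
        if len(covered) >= target:
            break  # the requested overall coverage has been reached
        newly = {
            i
            for i, (features, y) in enumerate(examples)
            if i not in covered
            and y == label
            and all(features.get(k) == v for k, v in conditions.items())
        }
        if newly:  # a rule is kept only if it adds coverage
            selected.append((conditions, label))
            covered |= newly
    coverage = (len(covered) / len(examples)) if examples else 0.0
    return selected, coverage


examples = [({"a": 1}, "x"), ({"a": 1}, "x"), ({"a": 2}, "y")]
rules = [({"a": 1}, "x"), ({"a": 1}, "x"), ({"a": 2}, "y")]
selected, coverage = select_by_coverage(rules, examples)
```

In this toy run the duplicate second rule is dropped because it covers nothing new, which is how coverage-based selection keeps the final classifier compact.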
Privacy-Preserving Data Mining on Blockchain-Based WSNs
Currently, the computational power present in the sensors forming a wireless sensor network (WSN) allows for implementing most of the data processing and analysis directly on the sensors in a decentralized way. This shift in paradigm introduces a shift in the privacy and security problems that need to be addressed. While a decentralized implementation avoids the single point of failure problem that typically applies to centralized approaches, it is subject to other threats, such as external monitoring, and new challenges, such as the complexity of providing decentralized implementations for data mining algorithms. In this paper, we present a solution for privacy-aware distributed data mining on wireless sensor networks. Our solution uses a permissioned blockchain to avoid a single point of failure in the system. Contracts are used to construct an onion-like structure encompassing the Hoeffding trees and a route. The onion-routed query conceals the network identity of the sensors from external adversaries, and obfuscates the actual computation to hide it from internally compromised nodes. We validate our solution on a use case related to an air quality-monitoring sensor network. We compare the quality of our model against traditional models to support the feasibility and viability of the solution.
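Hoeffding trees, which the abstract above relies on, decide when to split a node using the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)). The sketch below shows only that generic split test; it is not the paper's implementation, and the function names and the example parameter values are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound epsilon for n observations.

    value_range -- range R of the split metric (e.g. 1.0 for information
                   gain on a two-class problem)
    delta       -- allowed probability that the true mean lies outside
                   the bound
    n           -- number of examples seen at the node
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical values: 1000 examples seen, confidence parameter 1e-7.
eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
decision = can_split(0.30, 0.10, value_range=1.0, delta=1e-7, n=1000)
```

Because ε shrinks as n grows, a node can commit to a split after seeing only a bounded sample of the stream, which is what makes this family of trees a natural fit for resource-limited sensors.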